EVALUATION


The necessity for more complex software systems in such areas as
command and control and avionics has led to the desire for better
methods for predicting software errors, to ensure that the software
produced is of higher quality and of lower cost. This desire has been
expressed in numerous industry and Government sponsored conferences,
as well as in documents such as the Joint Commanders' Software
Reliability Working Group Report (Nov 1975). As a result, numerous
efforts have been initiated to develop and validate mathematical
models for predicting such quantities as the number of remaining
errors in a software package, the time to achieve a desired
reliability level, and a measure of the software reliability. However,
early efforts have not produced models with the desired accuracy of
prediction and with the necessary confidence limits for general model
usage.

This effort was initiated in response to this need for developing
better and more accurate software error prediction models and fits
into the goals of RADC TPO No. 5, Software Cost Reduction (formerly
RADC TPO No. 11, Software Sciences Technology), in the subthrust of
Software Quality (Software Modeling). This report summarizes the
development of classical and Bayesian estimates for the parameters of
a model for predicting quantities such as the expected number of
remaining errors, achieved reliability, and time to detect and correct
a specified number of errors. The model does not assume that a
detected software error is corrected with probability 1 (i.e. it
allows imperfect debugging). The importance of this development is
that it represents the first attempt to develop software error
prediction models that incorporate imperfect debugging, and thus more
closely reflect the actual software error detection and correction
process.

The theory and equations developed under this effort will lead to much
needed predictive measures for use by software managers in more
accurately tracking software development projects in terms of the test
time needed to achieve given reliability and error objectives. In
addition, the associated confidence limits and other related
statistical quantities developed under this effort will ensure more
widespread use of these modeling techniques. Finally, the predictive
measures and equations developed under this effort will be applicable
to current Air Force software development projects and thus help to
produce the high quality, low cost software needed for today's
systems.

1.  INTRODUCTION

In this report we present two methods for statistical inference
of the parameters of the imperfect debugging model proposed
by Goel and Okumoto [2]. The first is the classical approach
based on maximum likelihood estimation and the second is a Bayesian
approach based on the prior distributions of the unknown parameters.
The parameters under consideration are the initial number of software
errors, N, the occurrence rate of each error, λ, and
the probability of perfect debugging, p. The probability of
imperfect debugging is q, where q = 1-p.

The model in [2] is based on the assumption that the time
between software errors follows an exponential distribution with
parameter iλ, where i is the number of remaining errors. Also,
the error removal time is taken to be negligible. By letting X(t)
denote the number of errors remaining at time t, the stochastic
behavior of X(t) is analyzed as a semi-Markov process and the one
step transition probability from state i to state j is given by


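$$ Q_{ij}(t) = p_{ij}\,F_i(t), \tag{1.1} $$

where

$$ F_i(t) = 1 - e^{-i\lambda t} \tag{1.2} $$

and the transition probabilities $p_{ij}$ are given by

$$ p_{ij} = \begin{cases} p, & j = i-1, \\ q, & j = i, \\ 1, & i = j = 0, \\ 0, & \text{otherwise}, \end{cases} \qquad i, j = 0, 1, 2, \ldots, N. \tag{1.3} $$

Substituting (1.2) and (1.3) in (1.1) yields, for the rows
i = 1, 2, ..., N (state 0, with no errors remaining, is absorbing),

$$ \begin{array}{c|ccccccc}
 & 0 & 1 & 2 & \cdots & N-2 & N-1 & N \\ \hline
1 & pF_1(t) & qF_1(t) & 0 & \cdots & 0 & 0 & 0 \\
2 & 0 & pF_2(t) & qF_2(t) & \cdots & 0 & 0 & 0 \\
\vdots & & & & \ddots & & & \\
N-1 & 0 & 0 & 0 & \cdots & pF_{N-1}(t) & qF_{N-1}(t) & 0 \\
N & 0 & 0 & 0 & \cdots & 0 & pF_N(t) & qF_N(t)
\end{array} \tag{1.4} $$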


From the basic model (1.4), expressions for the following
quantities have been derived in [2].

• Distribution  of  time  to  a completely  debugged  system. 

• Distribution  of  time  to  a specified  number  of  remaining  errors. 

• Distribution  of  number  of  remaining  errors. 


• Expected  number  of  errors  detected  by  time  t . 
Also, the reliability function at the kth stage, i.e. between the
(k-1)st and kth failures, is obtained as

$$ R_k(x) = \sum_{j=0}^{k-1} \binom{k-1}{j}\, p^{\,k-1-j}\, q^{\,j}\, \bar{F}_{N-(k-1-j)}(x), \tag{1.5} $$

where

$$ \bar{F}_n(x) = 1 - F_n(x) = e^{-n\lambda x}. \tag{1.6} $$
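The stage-k reliability can be evaluated numerically. The following is a minimal sketch, assuming the binomial-mixture form implied by the model: after k-1 failures, j of the corrections were imperfect with binomial probability, and N-(k-1-j) errors remain, each with rate λ. The function name `stage_reliability` is hypothetical and is not taken from the report.

```python
import math

def stage_reliability(k, x, N, p, lam):
    """Probability of surviving x time units after the (k-1)st failure.

    Assumes k - 1 <= N, so the number of remaining errors is never negative.
    """
    q = 1.0 - p
    total = 0.0
    for j in range(k):                 # j = imperfect corrections so far
        removed = k - 1 - j            # errors actually eliminated
        remaining = N - removed        # errors still in the software
        weight = math.comb(k - 1, j) * p**removed * q**j
        total += weight * math.exp(-remaining * lam * x)
    return total
```

For k = 1 the sum has a single term and reduces to exp(-Nλx), the survival function of the first inter-failure time.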


In the following sections we will use these results for
statistical inference about N, p and λ.


2.  MAXIMUM  LIKELIHOOD  METHOD 

In this section we use the method of maximum likelihood to
draw inferences about N, p and λ based on available data (t, y)
for software errors. Here, t is the vector of times between software
failures while y is a vector of $y_i$'s where

$$ y_i = \begin{cases} 1, & \text{if the } i\text{th failure is caused by an error due to imperfect debugging,} \\ 0, & \text{otherwise.} \end{cases} $$

It should be noted that we make use of the data (t, y) because the
process X(t), the number of remaining errors at time t, is
unobservable. Such data can be obtained from actual software
error reports.

2.1  Likelihood  Function  and  MLE's 

As pointed out above, the state of X(t) cannot be observed.
However, we note that the sequence of error corrections forms a
sequence of Bernoulli trials. Suppose that (i-1) failures have
been observed and the ith failure has not occurred yet. Then the
number of errors eliminated so far follows a binomial
distribution with parameters (i-1, p) and its expectation is p(i-1).
Also, the expected number of errors left uncorrected because of
imperfect debugging is q(i-1). Since the initial number of errors is N,
the expected number of errors remaining in the software at
this stage is given by N - p(i-1).
Therefore, the distribution of the time between the (i-1)st and ith
failures is given by

$$ f(t_i \mid N, p, \lambda) = (N - p(i-1))\,\lambda\, e^{-(N - p(i-1))\lambda t_i}, \qquad i = 1, 2, \ldots, n, \tag{2.1} $$

where n is the number of observed software failures.

Then the likelihood function for a given t is

$$ L_1(N, p, \lambda \mid t) = \prod_{i=1}^{n} f(t_i \mid N, p, \lambda). \tag{2.2} $$

The next (ith) error will occur randomly from the remaining
N - p(i-1) errors and hence the probability of this error belonging to
the imperfect debugging category is q(i-1)/(N - p(i-1)). Therefore,
the distribution of $y_i$ is

$$ P(Y_i = y_i \mid N, p) = \left[\frac{q(i-1)}{N - p(i-1)}\right]^{y_i} \left[\frac{N - (i-1)}{N - p(i-1)}\right]^{1-y_i}, \qquad i = 1, 2, \ldots, n, \tag{2.3} $$

where $y_i$ = 0 or 1.

Then, the likelihood function for given y is

$$ L_2(N, p \mid y) = \prod_{i=1}^{n} P(Y_i = y_i \mid N, p). \tag{2.4} $$

Due to the independence of t and y, the likelihood function of
N, p and λ can be written as

$$ L(N, p, \lambda \mid t, y) = L_1(N, p, \lambda \mid t) \cdot L_2(N, p \mid y). \tag{2.5} $$


Now we choose $\hat N$, $\hat p$ and $\hat\lambda$ which maximize (2.5). Maximizing
L(N,p,λ|t,y) is equivalent to maximizing the log likelihood function. Let

$$ \ell(N, p, \lambda \mid t, y) = \log L(N, p, \lambda \mid t, y) = n \log \lambda - \lambda \sum_{i=1}^{n} (N - p(i-1))\, t_i + \sum_{i=1}^{n} y_i \log q(i-1) + \sum_{i=1}^{n} (1 - y_i) \log (N - (i-1)). \tag{2.6} $$

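Then $\hat N$, $\hat p$ and $\hat\lambda$ must satisfy

$$ \frac{\partial \ell}{\partial N} = 0, \qquad \frac{\partial \ell}{\partial p} = 0, \qquad \frac{\partial \ell}{\partial \lambda} = 0, \tag{2.7} $$

that is,

$$ \sum_{i=1}^{n} \frac{1 - y_i}{N - (i-1)} - \lambda \sum_{i=1}^{n} t_i = 0, \tag{2.8} $$

$$ \lambda \sum_{i=1}^{n} (i-1)\, t_i - \sum_{i=1}^{n} y_i / q = 0, \tag{2.9} $$

$$ n/\lambda - \sum_{i=1}^{n} (N - p(i-1))\, t_i = 0. \tag{2.10} $$

Simultaneous non-linear equations (2.8), (2.9) and (2.10) can be
solved by numerical methods as described below. From (2.10) we get

$$ \lambda = n \Big/ \sum_{i=1}^{n} (N - p(i-1))\, t_i. \tag{2.11} $$

Substituting λ of (2.11) into (2.8) and (2.9), we get

$$ f(N, p) \equiv \sum_{i=1}^{n} \frac{1 - y_i}{N - (i-1)} - \frac{n \sum_{i=1}^{n} t_i}{\sum_{i=1}^{n} (N - p(i-1))\, t_i} = 0, \tag{2.12} $$

$$ \varphi(N, p) \equiv \frac{q\, n \sum_{i=1}^{n} (i-1)\, t_i}{\sum_{i=1}^{n} (N - p(i-1))\, t_i} - \sum_{i=1}^{n} y_i = 0. \tag{2.13} $$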


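We now apply the Newton-Raphson method [4] for solving the two
simultaneous non-linear equations (2.12) and (2.13). Thus, for
given initial values $N_0$ and $p_0$ of N and p, the values of the
first approximations are

$$ N_1 = N_0 + h, \tag{2.14} $$

$$ p_1 = p_0 + k, \tag{2.15} $$

where

$$ h = -\frac{f_0\, \varphi_{p,0} - \varphi_0\, f_{p,0}}{f_{N,0}\, \varphi_{p,0} - \varphi_{N,0}\, f_{p,0}}, \tag{2.16} $$

$$ k = -\frac{f_{N,0}\, \varphi_0 - \varphi_{N,0}\, f_0}{f_{N,0}\, \varphi_{p,0} - \varphi_{N,0}\, f_{p,0}}, \tag{2.17} $$

and

$$ f_0 \equiv f(N_0, p_0), \qquad f_{N,0} \equiv \left.\frac{\partial f}{\partial N}\right|_{N=N_0,\, p=p_0}, \qquad f_{p,0} \equiv \left.\frac{\partial f}{\partial p}\right|_{N=N_0,\, p=p_0}. \tag{2.18} $$

$\varphi_0$, $\varphi_{N,0}$ and $\varphi_{p,0}$ are similarly defined.

The values of N and p are successively modified until
equations (2.12) and (2.13) are satisfied to a defined accuracy;
such values are the estimates $\hat N$, $\hat p$. Finally, we get
$\hat\lambda$ by substituting $\hat N$ and $\hat p$
into (2.11). These are the maximum likelihood estimates (MLE's)
of N, p and λ for given data t and y.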
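The iteration can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' program: the partial derivatives in (2.18) are approximated by central differences instead of being coded analytically, and the names `lam_hat`, `f_eq`, `phi_eq` and `newton_2d` are hypothetical.

```python
def lam_hat(N, p, t):
    """Equation (2.11): lambda expressed through N and p."""
    return len(t) / sum((N - p * (i - 1)) * ti for i, ti in enumerate(t, start=1))

def f_eq(N, p, t, y):
    """Left-hand side of (2.12)."""
    s = sum((1 - yi) / (N - (i - 1)) for i, yi in enumerate(y, start=1))
    return s - lam_hat(N, p, t) * sum(t)

def phi_eq(N, p, t, y):
    """Left-hand side of (2.13)."""
    s = sum((i - 1) * ti for i, ti in enumerate(t, start=1))
    return (1.0 - p) * lam_hat(N, p, t) * s - sum(y)

def newton_2d(f, phi, N0, p0, tol=1e-8, max_iter=50, eps=1e-6):
    """Newton-Raphson for f(N, p) = 0 and phi(N, p) = 0, steps as in (2.14)-(2.17)."""
    N, p = N0, p0
    for _ in range(max_iter):
        f0, g0 = f(N, p), phi(N, p)
        if abs(f0) < tol and abs(g0) < tol:
            break
        # Central-difference approximations to the partial derivatives (2.18).
        fN = (f(N + eps, p) - f(N - eps, p)) / (2 * eps)
        fp = (f(N, p + eps) - f(N, p - eps)) / (2 * eps)
        gN = (phi(N + eps, p) - phi(N - eps, p)) / (2 * eps)
        gp = (phi(N, p + eps) - phi(N, p - eps)) / (2 * eps)
        det = fN * gp - gN * fp
        h = -(f0 * gp - g0 * fp) / det
        k = -(fN * g0 - gN * f0) / det
        N, p = N + h, p + k
    return N, p
```

With data vectors `t` and `y` in hand, a call such as `newton_2d(lambda N, p: f_eq(N, p, t, y), lambda N, p: phi_eq(N, p, t, y), N0, p0)` returns the pair of estimates, after which `lam_hat` gives the estimate of λ. Convergence depends on the starting values, as with any Newton-type method.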

2.2  Likelihood  Contours 

The log likelihood surface for the parameters N, p and λ
is given by (2.6) as

$$ \ell(N, p, \lambda \mid t, y) = n \log \lambda - \lambda \sum_{i=1}^{n} (N - p(i-1))\, t_i + \sum_{i=1}^{n} y_i \log q(i-1) + \sum_{i=1}^{n} (1 - y_i) \log (N - (i-1)). $$

For given t and y, this defines a 4-dimensional surface
as a function of N, p and λ. The maximum value of this log
likelihood is obtained when N = $\hat N$, p = $\hat p$ and λ = $\hat\lambda$, i.e.

$$ \ell_{\max} = \ell(\hat N, \hat p, \hat\lambda \mid t, y). \tag{2.19} $$

In order to study the nature of this surface, we proceed as follows.

Let

$$ \ell(N, p, \lambda \mid t, y) = \rho\, \ell(\hat N, \hat p, \hat\lambda \mid t, y), \tag{2.20} $$

where ρ ≥ 1 is some constant.

We will investigate the nature of this surface by fixing
$\hat N$, $\hat p$ and $\hat\lambda$, one at a time, and varying the other two parameters.

Suppose we fix N = $\hat N$. Then, from (2.6) we have

$$ f(p, \lambda) \equiv n \log \lambda - \lambda \sum_{i=1}^{n} (\hat N - p(i-1))\, t_i + \sum_{i=1}^{n} y_i \log q(i-1) = C, \tag{2.21} $$

where

$$ C = \rho\, \ell(\hat N, \hat p, \hat\lambda \mid t, y) - \sum_{i=1}^{n} (1 - y_i) \log (\hat N - (i-1)). \tag{2.22} $$

By fixing ρ and hence C, equation (2.21) gives one contour
in the (p-λ) plane. To draw the contour for each value of ρ,
we choose several values of p and solve (2.21) numerically for the
corresponding values of λ. These pairs (p, λ) give the desired
contour.

Similarly, contours in the (N-λ) and (N-p) planes can be
obtained by fixing p = $\hat p$ and λ = $\hat\lambda$, respectively.
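For a fixed p, the left side of (2.21) is concave in λ with its peak at λ = n / Σ(N - p(i-1)) t_i, so a contour level C below the peak is crossed at two values of λ. A minimal sketch of the numerical solution, assuming simple bisection (the report does not say which root finder was used) and the hypothetical name `contour_lambdas`:

```python
import math

def contour_lambdas(p, N_fixed, t, y, C, iters=200):
    """Return the two lambda values solving (2.21) for given p (requires p < 1), or None."""
    n = len(t)
    S = sum((N_fixed - p * (i - 1)) * ti for i, ti in enumerate(t, start=1))
    K = sum(math.log((1.0 - p) * (i - 1))
            for i, yi in enumerate(y, start=1) if yi == 1)

    def g(lam):
        return n * math.log(lam) - lam * S + K - C

    peak = n / S                      # maximizer of g over lambda
    if g(peak) < 0:
        return None                   # contour level not reached for this p

    def bisect(lo, hi):
        # g changes sign between lo and hi
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            if (g(mid) > 0) == (g(lo) > 0):
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    lo = peak
    while g(lo) > 0:                  # walk left until g is non-positive
        lo /= 2.0
    hi = peak
    while g(hi) > 0:                  # walk right until g is non-positive
        hi *= 2.0
    return bisect(lo, peak), bisect(peak, hi)
```

Repeating the call over a grid of p values and collecting the returned pairs traces one closed contour in the (p-λ) plane.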

2.3  Confidence  Regions 

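In many instances interest lies in studying the joint
100(1-α)% confidence region for the parameters. For this purpose
we use the property that for large n minus twice the log likelihood
ratio has approximately a χ² distribution, here with 3 degrees of
freedom (one per parameter). In our case,

$$ \ell(N, p, \lambda \mid t, y) = \ell(\hat N, \hat p, \hat\lambda \mid t, y) - \tfrac{1}{2}\, \chi^2_{3;\alpha} \tag{2.23} $$

defines a 100(1-α)% confidence region for N, p and λ, where
$\chi^2_{3;\alpha}$ is the upper α point of the χ² distribution with
3 degrees of freedom.

Joint confidence regions for (p,λ), (N,p) and (N,λ) can be
obtained from (2.23) by using a numerical method similar to that
of Section 2.2.

Now, by writing

$$ \ell(N, p, \lambda \mid t, y) = \rho\, \ell(\hat N, \hat p, \hat\lambda \mid t, y), \qquad \rho \ge 1, $$

we get

$$ \chi^2_{3;\alpha} = 2(1 - \rho)\, \ell(\hat N, \hat p, \hat\lambda \mid t, y). \tag{2.24} $$

Equation (2.24) can be used to study the relationship between ρ
and the confidence coefficient (1-α).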

2.4  Asymptotic  Properties 

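For large sample size the MLE's are normally distributed, i.e.

$$ \begin{pmatrix} \hat N \\ \hat p \\ \hat\lambda \end{pmatrix} \sim N\!\left( \begin{pmatrix} N \\ p \\ \lambda \end{pmatrix},\ \Sigma_{cov} \right) \quad \text{as } n \to \infty. \tag{2.25} $$

The variance-covariance matrix is given by

$$ \Sigma_{cov} = \begin{pmatrix} r_{NN} & r_{Np} & r_{N\lambda} \\ r_{pN} & r_{pp} & r_{p\lambda} \\ r_{\lambda N} & r_{\lambda p} & r_{\lambda\lambda} \end{pmatrix}^{-1}, \tag{2.26} $$

where

$$ r_{ab} = -E\!\left( \frac{\partial^2 \ell}{\partial a\, \partial b} \right), \qquad a, b \in \{N, p, \lambda\}. \tag{2.27} $$

For the model under consideration, we have

$$ r_{NN} = \sum_{i=1}^{n} \frac{1}{(N - (i-1))(N - p(i-1))}, \tag{2.28} $$

$$ r_{Np} = r_{pN} = 0, \tag{2.29} $$

$$ r_{N\lambda} = r_{\lambda N} = \frac{1}{\lambda} \sum_{i=1}^{n} \frac{1}{N - p(i-1)}, \tag{2.30} $$

$$ r_{pp} = \frac{1}{q} \sum_{i=1}^{n} \frac{i-1}{N - p(i-1)}, \qquad \text{if } q \ne 0, \tag{2.31} $$

$$ r_{p\lambda} = r_{\lambda p} = -\frac{1}{\lambda} \sum_{i=1}^{n} \frac{i-1}{N - p(i-1)}, \tag{2.32} $$

$$ r_{\lambda\lambda} = \frac{n}{\lambda^2}. \tag{2.33} $$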
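The entries (2.28)-(2.33) and the inversion in (2.26) can be sketched in a few lines. This is an illustrative sketch: the function names are hypothetical, q > 0 is assumed, and the 3×3 inverse is written out by cofactors so that no external library is needed.

```python
def information_matrix(N, p, lam, n):
    """Matrix of r_ab values from (2.28)-(2.33), ordered as (N, p, lambda)."""
    q = 1.0 - p
    idx = range(1, n + 1)
    r_NN = sum(1.0 / ((N - (i - 1)) * (N - p * (i - 1))) for i in idx)
    r_Nl = sum(1.0 / (N - p * (i - 1)) for i in idx) / lam
    r_pp = sum((i - 1) / (N - p * (i - 1)) for i in idx) / q
    r_pl = -sum((i - 1) / (N - p * (i - 1)) for i in idx) / lam
    r_ll = n / lam**2
    return [[r_NN, 0.0, r_Nl],
            [0.0, r_pp, r_pl],
            [r_Nl, r_pl, r_ll]]

def invert_3x3(m):
    """Inverse of a 3x3 matrix via the adjugate (cofactor) formula."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[v / det for v in row] for row in adj]
```

In practice the unknown parameters are replaced by their estimates, e.g. `invert_3x3(information_matrix(N_hat, p_hat, lam_hat, n))`, to obtain an estimated variance-covariance matrix.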


2.5  Illustrative  Example 

We use a numerical example to illustrate the computation of
MLE's, likelihood contours, etc. based on the expressions derived
in Sections 2.1-2.4. Since data from actual software projects are
not available in the desired format, we use simulated data for
this purpose. A total of 45 $(t_i, y_i)$ values were simulated for N = 50,
p = 0.9 and λ = 0.1 and are given in Table 2.1. Details of the
simulation are given in Appendix A.

For this data set, we solve equations (2.12) and (2.13) by
applying the Newton-Raphson method with initial values

$N_0$ = 46 and $p_0$ = 1.0.

After  six  (6)  iterations,  the  solution  of  (2.12)  and  (2.13),  with  an 
accuracy  of  10~^, is 

$\hat N$ = 51.3 and $\hat p$ = 0.919.

Substituting these values in (2.11), we get $\hat\lambda$ = 0.085. For these
MLE's the maximum value of the log-likelihood function is given by
(2.19) as

$$ \ell_{\max} = -16. $$

Now we get the likelihood contours for ρ = 1.1, 1.3 and 1.5.
First we fix N = $\hat N$ = 51.3. Solving equation (2.21) for various p,
the contours are obtained as shown in Figure 2.1. Contours with
p = $\hat p$ and λ = $\hat\lambda$ in the (N-λ) and (N-p) planes, respectively, are
obtained similarly. These are shown in Figures 2.2 and 2.3,
respectively.

Now we use equation (2.24) to study the relationship between
the confidence coefficient (1-α) and the constant ρ. For given
$\ell_{\max}$ and various ρ values, the coefficients (1-α) are obtained
from the χ² table such that (2.24) is satisfied. Thus, for our
example we have

$$ \chi^2_{3;\alpha} = 2(1 - \rho)(-16), $$

and for ρ = 1.1 the value of (1-α) from the χ² table is 0.638.
Similarly, for ρ = 1.3, α = .022 (1-α = .978) and for ρ = 1.5,
α = .001 (1-α = .999). Plots of confidence level vs ρ for
$\ell_{\max}$ = -10, -12, -14, -16, -18 and -20 are given
in Figure 2.4. Confidence levels corresponding to the value of ρ
are also shown on the contours in Figures 2.1, 2.2 and 2.3.
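The χ² table look-up can be reproduced numerically. For 3 degrees of freedom the χ² distribution function has the closed form erf(√(x/2)) - √(2x/π) e^(-x/2), which is used in the sketch below; the function names are hypothetical and the value -16 is the maximum log likelihood quoted in this section.

```python
import math

def chi2_cdf_3df(x):
    """Distribution function of chi-square with 3 degrees of freedom."""
    if x <= 0:
        return 0.0
    return math.erf(math.sqrt(x / 2.0)) - math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)

def confidence_coefficient(rho, l_max):
    """(1 - alpha) implied by (2.24) for contour constant rho and maximum log likelihood l_max."""
    return chi2_cdf_3df(2.0 * (1.0 - rho) * l_max)

for rho in (1.1, 1.3, 1.5):
    print(rho, round(confidence_coefficient(rho, -16.0), 3))
```

Because the maximum log likelihood is negative and ρ ≥ 1, the argument 2(1-ρ)ℓ_max is non-negative, and the confidence coefficient increases with ρ.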

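Finally, the asymptotic distribution of $(\hat N, \hat p, \hat\lambda)$ is given by
(2.25).

The estimated variance-covariance matrix for the simulated
data is

$$ \Sigma_{cov} = \begin{pmatrix} 35.5 & -0.0122 & -0.0128 \\ -0.0122 & 2.25 \times 10^{-3} & 6.08 \times 10^{-4} \\ -0.0128 & 6.08 \times 10^{-4} & 6.41 \times 10^{-4} \end{pmatrix} $$

and the estimated correlation coefficients are

$$ \rho_{Np} = -0.15, \qquad \rho_{N\lambda} = -0.56, \qquad \rho_{p\lambda} = 0.51. $$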
Figure 2.4  Relationship Between the Confidence
Coefficient and the Constant ρ
(vertical axis: confidence coefficient × 100; horizontal axis: ρ-value, 1.0 to 2.0)
3.  BAYESIAN  INFERENCE


In this section we use a Bayesian approach for obtaining
posterior point estimates and the highest posterior density (HPD)
region for parameters N, p and λ.


3.1  Prior  Distributions 

The choice of the prior distribution for a parameter is
governed by several factors. Among these are the range of values
the parameter can take, the nature of the prior distribution based
on historical data, and ease of analytical tractability. In our
case, the conjugate priors (see [1]) for N and λ are gamma
distributions while the conjugate prior for p is a beta distribution,
i.e.,

$$ p(N) \propto N^{\alpha - 1} e^{-\beta N}, \qquad N > 0, \tag{3.1} $$

$$ p(p) \propto p^{\pi - 1} (1 - p)^{\rho - 1}, \qquad 0 < p < 1, \tag{3.2} $$

$$ p(\lambda) \propto \lambda^{\mu - 1} e^{-\gamma \lambda}, \qquad \lambda > 0. \tag{3.3} $$

For the case when we know very little about the parameters,
i.e., for the case of prior ignorance, we choose α = μ = 0, β = γ = 0
and π = ρ = 0.5 as proposed by Jeffreys (see [1]) and we have

$$ p(N) \propto 1/N, \tag{3.4} $$

$$ p(p) \propto p^{-1/2} (1 - p)^{-1/2}, \tag{3.5} $$

$$ p(\lambda) \propto 1/\lambda. \tag{3.6} $$

These are called the non-informative prior distributions.

We also assume the independence of prior information about
N, p and λ, i.e.

$$ p(N, p, \lambda) = p(N)\, p(p)\, p(\lambda). \tag{3.7} $$
3.2  Joint  Posterior  Distribution

By applying Bayes theorem we obtain the joint posterior
distribution of N, p and λ for given priors and the data, i.e.

$$ p(N, p, \lambda \mid t, y) \propto p(N, p, \lambda)\, L(N, p, \lambda \mid t, y), \tag{3.8} $$

where the likelihood function L(N,p,λ|t,y) for given t and y
is given by equation (2.5).

Let $\tilde N$, $\tilde p$ and $\tilde\lambda$ be the Bayesian point estimates for N, p
and λ, respectively. That is, the point $(\tilde N, \tilde p, \tilde\lambda)$ is the mode of
the joint posterior distribution p(N,p,λ|t,y). In other words,
p(N,p,λ|t,y) attains its maximum at $(\tilde N, \tilde p, \tilde\lambda)$. Therefore, $\tilde N$, $\tilde p$ and
$\tilde\lambda$ must make the partial derivatives of log p(N,p,λ|t,y) with
respect to N, p and λ vanish.

Then we get

$$ -\lambda \sum_{i=1}^{n} t_i + \sum_{i=1}^{n} \frac{1 - y_i}{N - (i-1)} + \frac{\alpha - 1}{N} - \beta = 0, \tag{3.13} $$

$$ \lambda \sum_{i=1}^{n} (i-1)\, t_i - \sum_{i=1}^{n} y_i / (1 - p) + \frac{\pi - 1}{p} - \frac{\rho - 1}{1 - p} = 0, \tag{3.14} $$

$$ n/\lambda - \sum_{i=1}^{n} (N - p(i-1))\, t_i + \frac{\mu - 1}{\lambda} - \gamma = 0. \tag{3.15} $$

Simultaneous non-linear equations (3.13), (3.14) and (3.15) can be
solved by numerical methods discussed in Section 2.1.


3.3  H.P.D.  Regions

It is useful to obtain the Bayesian confidence region or H.P.D.
region which gives the probability content of a contour for the joint
posterior distribution, p(N,p,λ|t,y). As an approximation, we may
use the fact that for large samples p(N,p,λ|t,y) tends to normality
(see Box and Tiao [1]). Therefore,

$$ -2 \log \frac{p(N, p, \lambda \mid t, y)}{p(\tilde N, \tilde p, \tilde\lambda \mid t, y)} \sim \chi^2_3. \tag{3.16} $$

It follows that the contour defined by

$$ \log p(N, p, \lambda \mid t, y) = \log p(\tilde N, \tilde p, \tilde\lambda \mid t, y) - \tfrac{1}{2}\, \chi^2_{3;\alpha} \tag{3.17} $$

encloses a region whose probability content is approximately (1-α).
Then the 100(1-α)% H.P.D. region is given by

$$ f(N, p, \lambda) = C, \tag{3.18} $$

where

$$ f(N, p, \lambda) = n \log \lambda - \lambda \sum_{i=1}^{n} (N - p(i-1))\, t_i + \sum_{i=1}^{n} y_i \log (1 - p) + \sum_{i=1}^{n} (1 - y_i) \log (N - (i-1)) + (\alpha - 1) \log N - \beta N + (\mu - 1) \log \lambda - \gamma \lambda + (\pi - 1) \log p + (\rho - 1) \log (1 - p) \tag{3.19} $$

and

$$ C = f(\tilde N, \tilde p, \tilde\lambda) - \tfrac{1}{2}\, \chi^2_{3;\alpha}. \tag{3.20} $$

The contour defined by (3.18) can be evaluated by numerical methods
as discussed in Section 2.1.
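The function of (3.19) and the contour level of (3.20) can be sketched as below. The names and default arguments are assumptions made for illustration: the defaults correspond to the non-informative choice (gamma shape and rate 0 for N and λ, beta parameters 0.5 for p), and the χ² point is found by bisection on the closed-form distribution function for 3 degrees of freedom instead of a table.

```python
import math

def log_posterior_kernel(N, p, lam, t, y, alpha=0.0, beta=0.0,
                         mu=0.0, gamma=0.0, pi_=0.5, rho=0.5):
    """Log posterior up to an additive constant, term by term as in (3.19)."""
    n = len(t)
    val = n * math.log(lam)
    for i, (ti, yi) in enumerate(zip(t, y), start=1):
        val -= lam * (N - p * (i - 1)) * ti
        val += math.log(1.0 - p) if yi == 1 else math.log(N - (i - 1))
    val += (alpha - 1.0) * math.log(N) - beta * N          # gamma prior on N
    val += (mu - 1.0) * math.log(lam) - gamma * lam        # gamma prior on lambda
    val += (pi_ - 1.0) * math.log(p) + (rho - 1.0) * math.log(1.0 - p)  # beta prior on p
    return val

def chi2_cdf_3df(x):
    """Distribution function of chi-square with 3 degrees of freedom."""
    if x <= 0:
        return 0.0
    return math.erf(math.sqrt(x / 2.0)) - math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)

def chi2_quantile_3df(prob):
    """Value x with chi2_cdf_3df(x) = prob, found by bisection on [0, 100]."""
    lo, hi = 0.0, 100.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf_3df(mid) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def hpd_threshold(f_mode, alpha_level):
    """Contour level C of (3.20), given the value of f at the posterior mode."""
    return f_mode - 0.5 * chi2_quantile_3df(1.0 - alpha_level)
```

A point (N, p, λ) lies inside the approximate 100(1-α)% H.P.D. region when `log_posterior_kernel` evaluated there is at least `hpd_threshold(f_mode, alpha_level)`.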

3.4  Numerical  Example 

To illustrate the computations for the various quantities
given in Sections 3.1, 3.2 and 3.3 we use the simulated data of
Table 2.1. Using the non-informative priors given in equations
(3.4), (3.5) and (3.6), the Bayesian point estimates of N, p, λ
are obtained by solving equations (3.13), (3.14) and (3.15) and are

$\tilde N$ = 51.43
$\tilde p$ = 0.927
$\tilde\lambda$ = 0.0836.

The Bayesian H.P.D. region for N, p and λ for this data
set is obtained from equation (3.18). Taking α = .10, the 90%
H.P.D. region is shown in Figure 3.1.

The 50%, 75% and 90% Bayesian regions for p and λ when
N = $\tilde N$ are given in Figure 3.2. Similar regions for N and λ
(p = $\tilde p$) and for N and p (λ = $\tilde\lambda$) are given in Figures 3.3 and 3.4,
respectively.

It is also useful to study the shapes of the posterior
distributions of parameters N, p and λ. These are obtained by
fixing the other two parameters at their Bayesian point estimates.
Plots of such distributions are given in Figures 3.5, 3.6 and 3.7
for N, p and λ, respectively.
Figure 3.2  Bayesian H.P.D. Regions with N = $\tilde N$ Based
on Non-informative Prior Distributions
(axes: parameter p and parameter λ)

Figure 3.4  Bayesian H.P.D. Regions with λ = $\tilde\lambda$ Based
on Non-informative Prior Distributions
(axes: parameter N and parameter p)

Posterior Distribution of Parameter
λ when N = $\tilde N$ and p = $\tilde p$


4.  CONCLUDING  REMARKS

In this report we presented two methods for statistical inference
of the parameters N, p and λ for the imperfect debugging
model. Using the method of maximum likelihood, expressions were
derived for the MLE's, the likelihood contours and the confidence
regions. A Bayesian approach was used to obtain the Bayesian point
estimates of N, p and λ. Bayesian H.P.D. regions for these
parameters were also studied. Numerical examples based on simulated
data were used to illustrate these results.
APPENDIX  A


SIMULATION  OF  DATA  (t, y)

In this Appendix we describe the procedure used for simulating
the data on times between software failures and the categories of
errors. Recall that $t_i$ denotes the time between the (i-1)st and
ith software failures. Also, assuming that a software error can
be identified as being one due to imperfect debugging whenever
it occurs, we have

$$ y_i = \begin{cases} 1, & \text{if the } i\text{th failure is caused by an error due to imperfect debugging,} \\ 0, & \text{otherwise.} \end{cases} $$

Therefore, $t_i$ and $y_i$ are the data needed for statistical inference
of parameters N, p and λ in the imperfect debugging model.

A flow chart for simulating these data is given in Figure A.1.
First we initialize the parameters N, λ, p, I (software failure
number), NR (number of remaining errors) and EI (number of errors
due to imperfect debugging). Then a random number RN which is
uniformly distributed over (0,1) is generated. Now, from equation
(2.1), the random variable $T_i$ has an exponential distribution
with parameter NR·λ; i.e.

$$ F_{T_i}(t_i) = 1 - e^{-NR \cdot \lambda \cdot t_i}. \tag{A.1} $$
Figure A.1  Flow chart for simulating the data (t, y). Box contents, in order:

• Initialization: parameters (N, p, λ); s/w failure number I ← 0;
  no. of remaining errors NR ← N; no. of errors due to imperfect
  debugging EI ← 0.
• Increase no. of failures by one: I ← I + 1.
• Test: NR = 0.
• Generate random number (RN).
• Generate time to Ith failure: $t_I = -\frac{1}{NR \times \lambda} \log(1 - RN)$.
• Generate RN.
• Test: RN ≤ q.
• Increase no. of errors due to imperfect debugging by one: EI ← EI + 1.
• Reduce no. of remaining errors by one: NR ← NR - 1.

For some value RN,

$$ 1 - e^{-NR \cdot \lambda \cdot t_i} = RN \tag{A.2} $$

and hence the simulated value of $t_i$ is given by

$$ t_i = -\frac{1}{NR \cdot \lambda} \log (1 - RN). \tag{A.3} $$

Next, we generate a new random number RN. If this new number
RN ≤ q, the probability of imperfect debugging, the quantity EI
is incremented by 1 and the number of remaining errors remains
unchanged. If RN > q, the number of remaining errors NR is
decreased by 1. The error which occurs next is selected randomly.
For given EI and NR the probability that an error due to imperfect
debugging is detected is EI/NR. Hence, if for a still new random
number RN, RN ≤ EI/NR, then we set $y_i$ = 1 and decrease EI by 1.
Otherwise, $y_i$ = 0. After repeating this procedure n times, we
obtain the simulated data set (t, y) where $t = (t_1, t_2, \ldots, t_n)$ and
$y = (y_1, y_2, \ldots, y_n)$. Table 2.1 shows a data set simulated by this
procedure, where N = 50, p = 0.9, λ = 0.1 and n = 45.
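The procedure can be sketched in code. This is a minimal sketch under one stated assumption about ordering: the category of the failing error is drawn, with probability EI/NR, before the correction attempt for that failure, so that the first failure always has y = 0. The function name `simulate` and the optional `seed` argument are hypothetical, and the sketch will not reproduce Table 2.1, whose random numbers are not given.

```python
import math
import random

def simulate(N, p, lam, n, seed=None):
    """Simulate up to n pairs (t_i, y_i) from the imperfect debugging model."""
    rng = random.Random(seed)
    q = 1.0 - p
    NR, EI = N, 0          # remaining errors, errors due to imperfect debugging
    t, y = [], []
    for _ in range(n):
        if NR == 0:        # nothing left to fail
            break
        # Category of the error causing this failure (assumed ordering, see lead-in).
        if rng.random() < EI / NR:
            y.append(1)
            EI -= 1
        else:
            y.append(0)
        # Inverse-transform sample of the exponential time, equation (A.3).
        t.append(-math.log(1.0 - rng.random()) / (NR * lam))
        # Correction attempt: imperfect with probability q.
        if rng.random() < q:
            EI += 1        # error stays in the software
        else:
            NR -= 1        # error removed
    return t, y

times, flags = simulate(50, 0.9, 0.1, 45, seed=1)
```

Passing a seed makes a run repeatable, which is convenient when checking estimation code against a known parameter set such as N = 50, p = 0.9 and λ = 0.1.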